SOME TOPICS IN STATISTICAL INFORMATION THEORY


by

S. Kullback

Summary

Attention is focused on informational properties of sub-sigma-algebras of the fundamental probability space, in contrast to the discussion in Information Theory and Statistics, where attention is devoted to informational properties of statistics, that is, random variables. In particular, the integral representation theorem for discrimination information is derived by methods believed to be more inherently information-theoretic than others that have been presented. Monotonic properties of conditional discrimination information are derived.

0. Preliminaries. In [11] attention was devoted to informational properties of statistics, that is, random variables. In this exposition, however, the discussion deals with informational properties of sub-sigma-algebras of the fundamental probability space. In particular, we shall present a proof of the integral representation theorem of discrimination information which is believed to be more information-theoretic in approach than other presentations of this basic result.

We present here certain notations, lemmas, and results on separable sigma-algebras which we shall use in this exposition.

We shall operate in the probability space $(\Omega, \mathcal{A}, P)$. Let $Z_t(\omega)$ and $Z(\omega)$ be non-negative random variables such that
$$\mu_t(A) = \int_A Z_t(\omega)\,dP(\omega), \qquad \mu(A) = \int_A Z(\omega)\,dP(\omega), \tag{0.1}$$
are probability measures. We also write (0.1) in the Radon-Nikodym differential formalism as
$$d\mu_t = Z_t\,dP, \qquad d\mu = Z\,dP. \tag{0.2}$$

$Z_t(\omega)$ and $Z(\omega)$ may be considered as generalized probability densities. If we assume that $\mu_t$ is absolutely continuous with respect to $\mu$, that is, $\mu_t \ll \mu$, then
$$d\mu_t = W_t\,d\mu = W_t Z\,dP = Z_t\,dP, \qquad W_t = Z_t/Z \quad \text{a.s.}, \tag{0.3}$$
so that $W_t$ is a likelihood ratio. We shall also require sequences of the generalized densities, corresponding probability measures, and likelihood ratios, that is,
$$d\mu_{tn} = Z_{tn}\,dP, \quad d\mu_n = Z_n\,dP, \quad d\mu_{tn} = W_{tn}\,d\mu_n = W_{tn} Z_n\,dP = Z_{tn}\,dP, \quad W_{tn} = Z_{tn}/Z_n \ \text{a.s.}, \quad n = 1, 2, \ldots \tag{0.4}$$


We shall have occasion to deal with the properties of relative conditional expectations as described in [13, p. 344].

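Let $\mathcal{B}$ be a sub-sigma-algebra of the sigma-algebra $\mathcal{A}$, that is, $\mathcal{B} \subset \mathcal{A}$. Corresponding to (0.2) and (0.3),
$$d\mu_{\mathcal{B}} = E^{\mathcal{B}} Z\,dP_{\mathcal{B}}, \quad d\mu_{t\mathcal{B}} = E^{\mathcal{B}} Z_t\,dP_{\mathcal{B}}, \quad d\mu_{t\mathcal{B}} = E_Z^{\mathcal{B}} W_t\,d\mu_{\mathcal{B}}, \qquad E_Z^{\mathcal{B}} W_t = \frac{E^{\mathcal{B}}(Z \cdot Z_t/Z)}{E^{\mathcal{B}} Z} = \frac{E^{\mathcal{B}} Z_t}{E^{\mathcal{B}} Z}, \tag{0.5}$$
where $P_{\mathcal{B}}$ is the restriction of $P$ to $\mathcal{B}$ defined by $P_{\mathcal{B}}(B) = P(B)$, $B \in \mathcal{B}$, and
$$\mu_{\mathcal{B}}(B) = \int_B Z\,dP = \int_B (E^{\mathcal{B}} Z)\,dP_{\mathcal{B}}, \qquad B \in \mathcal{B}, \tag{0.6}$$
$$\int_B (E_Z^{\mathcal{B}} X)\,d\mu_{\mathcal{B}} = \int_B X\,d\mu, \qquad B \in \mathcal{B}, \tag{0.7}$$
$$E^{\mathcal{B}}(ZX) = E^{\mathcal{B}} Z \cdot E_Z^{\mathcal{B}} X. \tag{0.8}$$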
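When $\mathcal{B}$ is generated by a finite partition, $E^{\mathcal{B}}$ is simply a $P$-weighted average over each atom (this is (2.14) below), so (0.5)-(0.8) can be checked by direct arithmetic. The following Python sketch does this on a six-point space; the numbers, the partition and the helper names `cond_exp` and `rel_cond_exp` are illustrative assumptions, not part of the exposition.

```python
# A minimal sketch on a finite sample space Omega = {0, ..., 5}; all numbers are illustrative.
P = [0.10, 0.20, 0.10, 0.25, 0.15, 0.20]   # P({omega})
Z = [0.50, 1.50, 1.00, 0.80, 1.20, 0.85]   # density of mu with respect to P (integrates to 1)
X = [2.0, -1.0, 3.0, 0.5, 4.0, 1.0]        # an arbitrary random variable
blocks = [[0, 1], [2, 3, 4], [5]]          # atoms of the sub-sigma-algebra B (hypothetical choice)


def cond_exp(f, p, blocks):
    """E^B f for B generated by a finite partition: the P-weighted average of f on each atom."""
    out = [0.0] * len(f)
    for b in blocks:
        avg = sum(f[i] * p[i] for i in b) / sum(p[i] for i in b)
        for i in b:
            out[i] = avg
    return out


def rel_cond_exp(x, z, p, blocks):
    """Relative conditional expectation E_Z^B X = E^B(ZX) / E^B Z, cf. (0.8)."""
    num = cond_exp([zi * xi for zi, xi in zip(z, x)], p, blocks)
    den = cond_exp(z, p, blocks)
    return [a / b for a, b in zip(num, den)]


EBZ = cond_exp(Z, P, blocks)
EZX = rel_cond_exp(X, Z, P, blocks)
EBZX = cond_exp([z * x for z, x in zip(Z, X)], P, blocks)

# (0.7): for each atom B, the integral of E_Z^B X d(mu_B) equals that of X d(mu),
# using d(mu_B) = E^B Z dP_B from (0.5) and d(mu) = Z dP from (0.2).
lhs_07 = [sum(EZX[i] * EBZ[i] * P[i] for i in b) for b in blocks]
rhs_07 = [sum(X[i] * Z[i] * P[i] for i in b) for b in blocks]
print(lhs_07)
print(rhs_07)
```

The two printed lists agree atom by atom, up to rounding, which is (0.7) in this finite setting.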

We shall need two results from probability theory (see, for example, [13, p. 140, Probs. 16, 17]), which we state as lemmas.

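Lemma 0.1. $\int |Z_{tn} - Z_t|\,dP \to 0$, resp. $\int |Z_n - Z|\,dP \to 0$, as $n \to \infty$ if and only if
$$\int_A Z_{tn}\,dP \to \int_A Z_t\,dP, \quad \text{resp.} \quad \int_A Z_n\,dP \to \int_A Z\,dP, \qquad n \to \infty,$$
uniformly in $A \in \mathcal{A}$.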

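Lemma 0.2. If $Z_{tn} \xrightarrow{P} Z_t$, resp. $Z_n \xrightarrow{P} Z$, then
$$\int_A Z_{tn}\,dP \to \int_A Z_t\,dP, \quad \text{resp.} \quad \int_A Z_n\,dP \to \int_A Z\,dP, \qquad n \to \infty,$$
uniformly in $A \in \mathcal{A}$. The convergence in probability may be replaced by almost sure convergence.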

Note that Lemmas 0.1 and 0.2 provide the chain of implications
$$\mu_n(A) \to \mu(A) \ \text{uniformly in } A \in \mathcal{A} \iff Z_n \xrightarrow{L_1} Z \implies Z_n \xrightarrow{P} Z \implies \mu_n(A) \to \mu(A) \ \text{uniformly in } A \in \mathcal{A}, \tag{0.9}$$
and a similar chain with the subscript $t$, where
$$Z_n \xrightarrow{L_1} Z \iff \int |Z_n - Z|\,dP \to 0. \tag{0.10}$$
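For probability densities, the uniform convergence in Lemma 0.1 is convergence in total variation, and the two sides of the lemma are linked by the standard identity $\sup_{A \in \mathcal{A}} |\mu_n(A) - \mu(A)| = \tfrac{1}{2}\int |Z_n - Z|\,dP$ (a textbook fact, not a statement made in this exposition). A brute-force Python check over every event of a four-point space, with made-up densities:

```python
from itertools import chain, combinations

# Illustrative finite space: P({omega}) and two densities with respect to P.
P = [0.2, 0.3, 0.1, 0.4]
Z = [1.0, 1.0, 1.0, 1.0]          # mu = P
Zn = [1.5, 0.5, 2.0, 0.875]       # mu_n; sum(Zn[i] * P[i]) == 1


def all_events(n):
    """Every subset of {0, ..., n-1}, i.e. every event of the power-set sigma-algebra."""
    return chain.from_iterable(combinations(range(n), k) for k in range(n + 1))


sup_diff = max(abs(sum((Zn[i] - Z[i]) * P[i] for i in A)) for A in all_events(len(P)))
half_l1 = 0.5 * sum(abs(a - b) * p for a, b, p in zip(Zn, Z, P))
print(sup_diff, half_l1)
```

The supremum is attained at the event where $Z_n > Z$, which is why uniform convergence over $\mathcal{A}$ and $L_1$ convergence of the densities go together.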

We assemble here certain results on separable sigma-algebras which we shall need [13].

(0.11) A separable sigma-algebra is a sigma-algebra that is generated by (is minimal over) a countable class of sets.
(0.12) The minimal sigma-algebra over the union of a countable class of separable sigma-algebras is also a separable sigma-algebra.

(0.13) The Borel sigma-algebra on the real line is separable.
(0.14) The inverse image of a separable sigma-algebra by a measurable transformation is a separable sigma-algebra.

(0.15) The sub-sigma-algebra induced by a random variable or a countable class of random variables is a separable sigma-algebra.

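(0.16) A finite (countable) partition of a space $\Omega$ is a finite (countable) sequence of sets $A_i$ such that
$$\Omega = \bigcup_i A_i, \qquad A_i \cap A_j = \emptyset, \quad i \neq j.$$
If $\mathcal{A}$ is a sigma-algebra of subsets of $\Omega$, then the partition is $\mathcal{A}$-measurable if $A_i \in \mathcal{A}$ for all $i$. Let $\mathcal{E} = \{E_j\}$ be an $\mathcal{A}$-measurable partition. The $\mathcal{A}$-measurable partition $\mathcal{D} = \{D_i\}$ is said to be a subpartition of $\mathcal{E}$, or finer than the partition $\mathcal{E}$, if each $D_i \in \mathcal{D}$ is such that $D_i \subset E_j$ for some $E_j \in \mathcal{E}$, and we denote this by $\mathcal{D} \geq \mathcal{E}$ or $\mathcal{E} \leq \mathcal{D}$.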
(0.17) A sequence of partitions $\{\mathcal{E}^n\}$ is said to be regular if each $\mathcal{E}^n$ is a finite partition and
$$\mathcal{E}^1 \leq \mathcal{E}^2 \leq \mathcal{E}^3 \leq \cdots.$$

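(0.18) Let $\mathcal{F}^n$ denote the finite algebra generated by the partition $\mathcal{E}^n$; then, corresponding to (0.17),
$$\mathcal{F}^1 \subset \mathcal{F}^2 \subset \mathcal{F}^3 \subset \cdots,$$
and $\bigcup_{n=1}^{\infty} \mathcal{F}^n$ is an algebra. Let $\mathcal{E}$ be the minimal sigma-algebra over $\bigcup_n \mathcal{F}^n$; then $\mathcal{E}$ is said to be generated by the regular sequence of partitions (0.17).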

(0.19) It is clear that if a sigma-algebra is generated by a regular sequence of partitions, then it is separable. Conversely, every separable sigma-algebra $\mathcal{A}$ can be generated by a regular sequence of partitions.
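As an illustration of (0.16)-(0.19), chosen here rather than taken from the text: the dyadic partitions $\mathcal{E}^n = \{[k2^{-n}, (k+1)2^{-n})\}$ of $[0, 1)$ form a regular sequence, and they generate the Borel sigma-algebra of $[0, 1)$. The Python sketch below (hypothetical helper names) checks the refinement relation $\mathcal{E}^{n+1} \geq \mathcal{E}^n$ with exact rational endpoints.

```python
from fractions import Fraction


def dyadic_partition(n):
    """E^n: the 2**n half-open intervals [k/2**n, (k+1)/2**n) covering [0, 1)."""
    return [(Fraction(k, 2 ** n), Fraction(k + 1, 2 ** n)) for k in range(2 ** n)]


def covers_unit_interval(part):
    """True if the intervals are disjoint and their union is exactly [0, 1)."""
    ivs = sorted(part)
    return (ivs[0][0] == 0 and ivs[-1][1] == 1
            and all(left[1] == right[0] for left, right in zip(ivs, ivs[1:])))


def finer(D, E):
    """D >= E in the sense of (0.16): each set of D lies inside some set of E."""
    return all(any(e0 <= d0 and d1 <= e1 for e0, e1 in E) for d0, d1 in D)


seq = [dyadic_partition(n) for n in range(1, 7)]
is_regular = all(covers_unit_interval(p) for p in seq) and all(
    finer(seq[k + 1], seq[k]) for k in range(len(seq) - 1))
print(is_regular, finer(seq[0], seq[1]))
```

The second printed value is False: the halves are not finer than the quarters, so the ordering in (0.17) runs in one direction only.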


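1. Introduction. The result in Corollary 3.2, page 16 of [11] suggests that the discrimination information in the sub-sigma-algebra $\mathcal{B} \subset \mathcal{A}$ generated by the partition $\{B_i\}$, $i = 1, 2, \ldots$, $B_i \in \mathcal{A}$, $\bigcup_i B_i = \Omega$, be defined by (we shall use natural logarithms)
$$I(\mathcal{B}; \mu_t, \mu) = \sum_i \mu_t(B_i) \ln \frac{\mu_t(B_i)}{\mu(B_i)}, \qquad B_i \in \mathcal{B} \subset \mathcal{A}. \tag{1.1}$$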

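Because of the convexity of the function $x \ln(x/y)$ for non-negative $x$ and $y$, and the additivity of the measures for disjoint sets, for $A_1, A_2 \in \mathcal{A}$, $A_1 \cap A_2 = \emptyset$,
$$\mu_t(A_1) \ln \frac{\mu_t(A_1)}{\mu(A_1)} + \mu_t(A_2) \ln \frac{\mu_t(A_2)}{\mu(A_2)} \geq \mu_t(A_1 + A_2) \ln \frac{\mu_t(A_1 + A_2)}{\mu(A_1 + A_2)}. \tag{1.2}$$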
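Inequality (1.2) is the two-term case of the log-sum inequality, which holds for arbitrary non-negative numbers (with $\mu(A_i) > 0$), with equality exactly when $\mu_t(A_1)/\mu(A_1) = \mu_t(A_2)/\mu(A_2)$. A quick randomized Python check, offered as an illustration only; the helper names are hypothetical.

```python
import math
import random


def term(mt, m):
    """mu_t(A) ln(mu_t(A) / mu(A)), with the convention 0 ln(0/y) = 0; assumes m > 0."""
    return 0.0 if mt == 0 else mt * math.log(mt / m)


def gap(mt1, m1, mt2, m2):
    """Left-hand side minus right-hand side of (1.2) for disjoint A1, A2."""
    return term(mt1, m1) + term(mt2, m2) - term(mt1 + mt2, m1 + m2)


rng = random.Random(0)
gaps = [gap(rng.random(), rng.random() + 1e-3, rng.random(), rng.random() + 1e-3)
        for _ in range(10_000)]
print(min(gaps))                  # non-negative up to rounding
print(gap(0.1, 0.2, 0.2, 0.4))    # equal ratios mu_t/mu: the two sides of (1.2) coincide
```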

The property in (1.2) suggests that the discrimination information in $\mathcal{A}$ be defined by (cf. [1], [2], [3], [6], [7], [8], [10], [14], [15])
$$\bar{I}(\mathcal{A}; \mu_t, \mu) = \sup \sum_i \mu_t(A_i) \ln \frac{\mu_t(A_i)}{\mu(A_i)}, \qquad A_i \in \mathcal{A}, \tag{1.3}$$
where the sup is taken over all possible $\mathcal{A}$-measurable finite partitions of $\Omega$. For convenience hereafter we shall omit $\mu_t$ and $\mu$ in $I(\mathcal{B}; \mu_t, \mu)$ and $\bar{I}(\mathcal{A}; \mu_t, \mu)$ unless needed for clarification.
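On a finite space whose atoms generate $\mathcal{A}$, the supremum in (1.3) can be computed by brute force over all set partitions, and by (1.11) it should equal the integral (1.4), attained at the finest partition. A Python sketch with five atoms (hence 52 partitions); the densities and helper names are arbitrary illustrative choices.

```python
import math

# Illustrative finite probability space with five atoms.
P = [0.10, 0.30, 0.20, 0.15, 0.25]
Zt = [2.00, 0.50, 1.50, 1.00, 0.80]   # density of mu_t w.r.t. P
Z = [1.00, 1.20, 0.50, 2.00, 0.56]    # density of mu w.r.t. P


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for k in range(len(part)):               # put `first` into an existing block ...
            yield part[:k] + [[first] + part[k]] + part[k + 1:]
        yield [[first]] + part                   # ... or into a block of its own


def partition_sum(blocks):
    """The sum in (1.3) for one finite partition."""
    total = 0.0
    for b in blocks:
        mt = sum(Zt[i] * P[i] for i in b)
        m = sum(Z[i] * P[i] for i in b)
        total += mt * math.log(mt / m)
    return total


sums = [partition_sum(p) for p in set_partitions(list(range(len(P))))]
I_bar = max(sums)                                                     # (1.3)
I_int = sum(zt * p * math.log(zt / z) for zt, z, p in zip(Zt, Z, P))  # (1.4)
print(len(sums), I_bar, I_int)
```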

If $\mu_t$ is not absolutely continuous with respect to $\mu$, that is, there exists an $A \in \mathcal{A}$ such that $\mu(A) = 0$, $\mu_t(A) \neq 0$, then $\bar{I}(\mathcal{A}) = \infty$. Accordingly we shall assume that $\mu_t \ll \mu$. Note that $\bar{I}(\mathcal{A})$ may be infinite in this case also; $\mu_t \ll \mu$ is a necessary condition for $\bar{I}(\mathcal{A}) < \infty$. (See [11, p. 5; Prob. 5.7, p. 10].)


It also seems intuitively reasonable to have defined the discrimination information in $\mathcal{A}$ by (cf. [11, p. 5])
$$I(\mathcal{A}) = \int Z_t(\omega) \ln \frac{Z_t(\omega)}{Z(\omega)}\,dP(\omega). \tag{1.4}$$
The integral representation (1.4) may also be written as
$$I(\mathcal{A}) = \int Z_t \ln \frac{Z_t}{Z}\,dP = \int \ln \frac{d\mu_t}{d\mu}\,d\mu_t = \int W_t \ln W_t\,d\mu = \int (\ln W_t)\,d\mu_t. \tag{1.5}$$


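Using the additivity of the integral and Jensen's inequality for convex functions [11, p. 16],
$$\int W_t \ln W_t\,d\mu = \sum_i \int_{A_i} W_t \ln W_t\,d\mu, \qquad A_i \in \mathcal{A}, \quad A_i \cap A_j = \emptyset \ (i \neq j), \quad \bigcup_i A_i = \Omega, \tag{1.6}$$
$$\int_{A_i} W_t \ln W_t\,d\mu \geq \int_{A_i} W_t\,d\mu \, \ln \frac{\int_{A_i} W_t\,d\mu}{\mu(A_i)} = \mu_t(A_i) \ln \frac{\mu_t(A_i)}{\mu(A_i)}, \tag{1.7}$$
$$\int W_t \ln W_t\,d\mu \geq \sum_i \mu_t(A_i) \ln \frac{\mu_t(A_i)}{\mu(A_i)}, \tag{1.8}$$
hence
$$I(\mathcal{A}) = \int W_t \ln W_t\,d\mu \geq \sup \sum_i \mu_t(A_i) \ln \frac{\mu_t(A_i)}{\mu(A_i)} = \bar{I}(\mathcal{A}), \tag{1.9}$$
where the sup is taken over all possible $\mathcal{A}$-measurable finite partitions of $\Omega$. A proof of the reverse of the inequality in (1.9) may be obtained following the method in [15, pp. 24-25] or that in [8]. We shall not pursue this matter any further at this point but state the integral representation theorem.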
Theorem 1.1. $I(\mathcal{A}) = \bar{I}(\mathcal{A})$, that is, (1.3) and (1.5) define the same value of the discrimination information.
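Theorem 1.1 can be illustrated numerically in a case where $I(\mathcal{A})$ is known in closed form. Assuming, purely for illustration, $\mu_t = N(\theta, 1)$ and $\mu = N(0, 1)$ on the real line, (1.5) gives $I = \int (\theta x - \theta^2/2)\,d\mu_t = \theta^2/2$. The Python sketch below evaluates the partition sums of (1.3) over nested grids and shows them increasing toward that value; the grid, the truncation point `L`, and the helper names are our own choices.

```python
import math

SQRT2 = math.sqrt(2.0)


def normal_mass(a, b, m):
    """P(a <= X < b) for X ~ N(m, 1); differences are taken in the tail nearer the cell
    so that far-out cells keep their relative accuracy."""
    a, b = a - m, b - m
    if b <= 0.0:  # cell lies below the mean: Phi(x) = erfc(-x / sqrt2) / 2
        return 0.5 * (math.erfc(-b / SQRT2) - math.erfc(-a / SQRT2))
    return 0.5 * (math.erfc(a / SQRT2) - math.erfc(b / SQRT2))  # Q(x) = erfc(x / sqrt2) / 2


def partition_sum(theta, h, L=6.0):
    """The sum in (1.3) for mu_t = N(theta, 1), mu = N(0, 1), over the partition of the line
    into (-inf, -L), the cells [-L + k h, -L + (k + 1) h), and [L, inf)."""
    n = int(round(2 * L / h))
    edges = [-math.inf] + [-L + k * h for k in range(n + 1)] + [math.inf]
    total = 0.0
    for a, b in zip(edges, edges[1:]):
        mt, m = normal_mass(a, b, theta), normal_mass(a, b, 0.0)
        if mt > 0.0:                      # convention 0 ln(0/y) = 0
            total += mt * math.log(mt / m)
    return total


theta = 1.0
exact = theta ** 2 / 2                                          # I(A) from (1.5)
approx = [partition_sum(theta, 2.0 ** -j) for j in range(7)]    # nested grids, h = 1, ..., 1/64
for j, v in enumerate(approx):
    print(f"h = 2**-{j}:  partition sum = {v:.6f}   (I = {exact})")
```

Because the grids are nested, (1.2) forces the sums to be non-decreasing, and (1.9) keeps them below $\theta^2/2$.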

Proofs of this theorem have involved martingale theory (e.g., [10]) or the use in [8] of the convexity property in conjunction with the Darboux-Young approach to the integral [13, p. 143, Ex. 29].

The proof to be presented later in this exposition is believed to be intrinsically more information-theoretic in nature. For other approaches see [2], [7], [14], [15].

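Note that if in the probability space $(\Omega, \mathcal{A}, P)$, $\mathcal{A}$ is generated by a finite partition $\{A_i\}$, then (0.1) is
$$\mu_t(A_i) = Z_t(\omega) P(A_i), \qquad \mu(A_i) = Z(\omega) P(A_i), \qquad \omega \in A_i, \tag{1.10}$$
and both (1.3) and (1.5) yield
$$I(\mathcal{A}) = \sum_i \mu_t(A_i) \ln \frac{\mu_t(A_i)}{\mu(A_i)} = \bar{I}(\mathcal{A}). \tag{1.11}$$

We remark that instead of starting with the probability space $(\Omega, \mathcal{A}, P)$ we could have started with the measure space $(\Omega, \mathcal{A}, \lambda)$, where $\lambda$ is a sigma-finite measure on $\mathcal{A}$. We assume the existence of a non-negative $\mathcal{A}$-measurable function $X(\omega)$ such that
$$P(A) = \int_A X(\omega)\,d\lambda(\omega), \qquad A \in \mathcal{A}, \tag{1.12}$$
is a probability measure on $\mathcal{A}$. We then have (see [11, p. 5])
$$d\mu_t = Z_t\,dP = Z_t X\,d\lambda = f_t\,d\lambda, \qquad d\mu = Z\,dP = Z X\,d\lambda = f\,d\lambda \quad [\lambda], \tag{1.13}$$
$$\int Z_t \ln \frac{Z_t}{Z}\,dP = \int Z_t \ln \frac{Z_t}{Z}\,X\,d\lambda = \int f_t \ln \frac{f_t}{f}\,d\lambda. \tag{1.14}$$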
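The representation (1.14) does not depend on which dominating measure is used. A small Python check, assuming a four-point space, an arbitrary positive measure `lam` standing in for $\lambda$, and illustrative densities:

```python
import math

# Four-point space; lam is an arbitrary positive measure (a hypothetical stand-in for lambda).
lam = [2.0, 0.5, 1.0, 4.0]
P = [0.4, 0.1, 0.3, 0.2]
Zt = [0.5, 3.0, 1.0, 1.0]      # density of mu_t w.r.t. P
Z = [1.25, 1.0, 0.5, 1.25]     # density of mu w.r.t. P

X = [p / l for p, l in zip(P, lam)]         # (1.12): P({omega}) = X(omega) lam({omega})
ft = [zt * x for zt, x in zip(Zt, X)]       # (1.13): d mu_t = f_t d lam
f = [z * x for z, x in zip(Z, X)]           #         d mu   = f   d lam

I_wrt_P = sum(zt * math.log(zt / z) * p for zt, z, p in zip(Zt, Z, P))    # left side of (1.14)
I_wrt_lam = sum(a * math.log(a / b) * l for a, b, l in zip(ft, f, lam))   # right side of (1.14)
print(I_wrt_P, I_wrt_lam)
```

The factor $X$ cancels in $f_t/f$, which is the whole content of the last equality in (1.14).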

2. Discrimination Information in a Sub-sigma-algebra. The following discussion essentially extends some of the presentation in [11, pages 1-78]. We shall use the notation in [13] and in particular properties developed in [13, Chapter VII, Conditioning, pp. 337 ff.]. In particular, we shall use the fact that Jensen's inequality holds a.s. for conditional expectation also. Let $\mathcal{B}$ be a sub-sigma-algebra of the sigma-algebra $\mathcal{A}$. The basic inequality is that if $g$ is a convex function and $EX$ is finite, then

$$E\{g(X)\} \geq g(EX). \tag{2.1}$$

In the conditional form, we have
$$E^{\mathcal{B}}\{g(X)\} \geq g(E^{\mathcal{B}} X) \quad \text{a.s.} \tag{2.2}$$

Using the definition of conditional expectation, (2.2) and (1.4),

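$$I(\mathcal{A}) = \int Z_t \ln \frac{Z_t}{Z}\,dP = \int E^{\mathcal{B}}\Big(Z_t \ln \frac{Z_t}{Z}\Big)\,dP_{\mathcal{B}} \geq \int E^{\mathcal{B}} Z_t \ln \frac{E^{\mathcal{B}} Z_t}{E^{\mathcal{B}} Z}\,dP_{\mathcal{B}}. \tag{2.3}$$

We define the right-hand side of (2.3) as the discrimination information in the sub-sigma-algebra $\mathcal{B}$ and note that, using (0.5),
$$I(\mathcal{B}) = \int E^{\mathcal{B}} Z_t \ln \frac{E^{\mathcal{B}} Z_t}{E^{\mathcal{B}} Z}\,dP_{\mathcal{B}} = \int E_Z^{\mathcal{B}} W_t \ln E_Z^{\mathcal{B}} W_t\,d\mu_{\mathcal{B}} = \int (\ln E_Z^{\mathcal{B}} W_t)\,d\mu_{t\mathcal{B}} = \sup \sum_k \mu_t(B_k) \ln \frac{\mu_t(B_k)}{\mu(B_k)}, \tag{2.4}$$
where the sup is taken over all finite $\mathcal{B}$-measurable partitions of $\Omega$ (see (2.15) and Section 1). We can now state:

Theorem 2.1. If $\mathcal{B}$ is a sub-sigma-algebra of $\mathcal{A}$, then
$$I(\mathcal{A}) \geq I(\mathcal{B}). \tag{2.5}$$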

Note that the coarsest possible sub-sigma-algebra is that generated by $\{\emptyset, \Omega\}$; denoting it by $\mathcal{B}_0$, we have $E^{\mathcal{B}_0} Z_t = E Z_t = 1$, $E^{\mathcal{B}_0} Z = E Z = 1$, and
$$I(\mathcal{B}_0) = 0. \tag{2.6}$$

Theorem 2.2. $I(\mathcal{A}) \geq 0$, with equality if and only if $W_t = Z_t/Z = 1$ a.s. (See [11, Theorem 3.1, p. 14].)

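From (1.5) and (2.4), using the result that
$$\int W_t \ln E_Z^{\mathcal{B}} W_t\,d\mu = \int E_Z^{\mathcal{B}}(W_t \ln E_Z^{\mathcal{B}} W_t)\,d\mu_{\mathcal{B}} = \int E_Z^{\mathcal{B}} W_t \ln E_Z^{\mathcal{B}} W_t\,d\mu_{\mathcal{B}}, \tag{2.7}$$
we have
$$I(\mathcal{A}) - I(\mathcal{B}) = \int W_t \ln W_t\,d\mu - \int E_Z^{\mathcal{B}} W_t \ln E_Z^{\mathcal{B}} W_t\,d\mu_{\mathcal{B}} = \int W_t \ln \frac{W_t}{E_Z^{\mathcal{B}} W_t}\,d\mu = \int \frac{Z_t}{Z} \ln \frac{Z_t/Z}{E^{\mathcal{B}} Z_t / E^{\mathcal{B}} Z}\,d\mu. \tag{2.8}$$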


Since
$$\int W_t\,d\mu = \int \frac{Z_t}{Z}\,d\mu = \int Z_t\,dP = 1,$$
$$\int E_Z^{\mathcal{B}} W_t\,d\mu_{\mathcal{B}} = \int \frac{E^{\mathcal{B}} Z_t}{E^{\mathcal{B}} Z}\,d\mu_{\mathcal{B}} = \int E^{\mathcal{B}} Z_t\,dP_{\mathcal{B}} = \int Z_t\,dP = 1, \tag{2.9}$$
the right-hand side of (2.8) is a discrimination information value, and as such non-negative. Let us define, using (2.8) above, and (0.2) and (0.5),

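$$I(\mathcal{A}|\mathcal{B}) = \int W_t \ln \frac{W_t}{E_Z^{\mathcal{B}} W_t}\,d\mu = \int Z_t \ln \frac{Z_t/Z}{E^{\mathcal{B}} Z_t / E^{\mathcal{B}} Z}\,dP = \int \ln \frac{d\mu_t/d\mu}{d\mu_{t\mathcal{B}}/d\mu_{\mathcal{B}}}\,d\mu_t; \tag{2.10}$$
that is, $I(\mathcal{A}|\mathcal{B})$ is the conditional discrimination information in $\mathcal{A}$ given $\mathcal{B}$, and hence

Theorem 2.3. If $\mathcal{B}$ is a sub-sigma-algebra of $\mathcal{A}$,
$$I(\mathcal{A}) = I(\mathcal{B}) + I(\mathcal{A}|\mathcal{B}). \tag{2.11}$$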

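Theorem 2.4. If $\mathcal{B}$ is a sub-sigma-algebra of $\mathcal{A}$, then
$$I(\mathcal{A}) = I(\mathcal{B}) \tag{2.12}$$
if and only if $I(\mathcal{A}|\mathcal{B}) = 0$, that is, if and only if
$$Z_t/Z = E^{\mathcal{B}} Z_t / E^{\mathcal{B}} Z \quad \text{a.s.} \tag{2.13}$$

Proof. Apply Theorem 2.2 and (2.10).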

If $\mathcal{B}$ is a sub-sigma-algebra of $\mathcal{A}$ and satisfies Theorem 2.4, then we say that $\mathcal{B}$ is a sufficient sub-sigma-algebra for $\mathcal{A}$. (See [13, p. 346], [11, pp. 18-22].)
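A classical example of a sufficient sub-sigma-algebra, offered here as an illustration and not taken from the text: for $n$ independent Bernoulli trials, the sigma-algebra generated by the number of successes is sufficient, because the likelihood ratio depends on the outcome only through that count, so (2.13) holds. In the Python sketch below, $\mu_t$ and $\mu$ are product Bernoulli measures with hypothetical parameters `p` and `q`, and $P$ may be taken as any measure charging every point, since the sums (2.15) involve only $\mu_t$ and $\mu$.

```python
import math
from itertools import product

n, p, q = 4, 0.3, 0.6     # illustrative: n trials, success probability p under mu_t, q under mu


def prob(seq, r):
    """Probability of a 0/1 sequence under i.i.d. Bernoulli(r) trials."""
    s = sum(seq)
    return r ** s * (1 - r) ** (len(seq) - s)


def info(blocks):
    """Sum of mu_t(B) ln(mu_t(B)/mu(B)) over the atoms of a finite partition, as in (2.15)."""
    total = 0.0
    for b in blocks:
        mt = sum(prob(w, p) for w in b)
        m = sum(prob(w, q) for w in b)
        total += mt * math.log(mt / m)
    return total


omega = list(product((0, 1), repeat=n))
I_A = info([[w] for w in omega])                                            # every point an atom
I_count = info([[w for w in omega if sum(w) == k] for k in range(n + 1)])   # B from the count
I_first = info([[w for w in omega if w[0] == v] for v in (0, 1)])           # B from trial 1 only
kl = p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
print(I_A, n * kl, I_count, I_first)
```

The count keeps all of $I(\mathcal{A}) = n$ times the one-trial information, while the first trial alone retains only one trial's worth.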

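If $\mathcal{B}$ is generated by a finite partition $\{B_i\}$, then on a non-null atom $B \in \mathcal{B}$ the conditional expected value $E^{\mathcal{B}} X$ is a constant and its value is
$$E^{\mathcal{B}} X = \frac{1}{P(B)} \int_B X\,dP, \qquad \omega \in B. \tag{2.14}$$

Thus
$$I(\mathcal{B}) = \int E^{\mathcal{B}} Z_t \ln \frac{E^{\mathcal{B}} Z_t}{E^{\mathcal{B}} Z}\,dP_{\mathcal{B}} = \sum_i P(B_i)\,\frac{1}{P(B_i)} \int_{B_i} Z_t\,dP \, \ln \frac{\int_{B_i} Z_t\,dP}{\int_{B_i} Z\,dP} = \sum_i \mu_t(B_i) \ln \frac{\mu_t(B_i)}{\mu(B_i)}. \tag{2.15}$$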
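Note that in this case
$$\frac{Z_t}{E^{\mathcal{B}} Z_t} = \frac{Z_t}{\frac{1}{P(B)} \int_B Z_t\,dP} = \frac{Z_t}{\mu_t(B)/P(B)}, \qquad \frac{Z}{E^{\mathcal{B}} Z} = \frac{Z}{\frac{1}{P(B)} \int_B Z\,dP} = \frac{Z}{\mu(B)/P(B)}, \qquad \omega \in B, \tag{2.16}$$
and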


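$$\begin{aligned} I(\mathcal{A}|\mathcal{B}) &= \int Z_t \ln \frac{Z_t/E^{\mathcal{B}} Z_t}{Z/E^{\mathcal{B}} Z}\,dP = \sum_i \int_{B_i} Z_t \ln \frac{Z_t/E^{\mathcal{B}} Z_t}{Z/E^{\mathcal{B}} Z}\,dP = \sum_i \int_{B_i} Z_t \ln \frac{Z_t\,P(B_i)\,\mu(B_i)}{\mu_t(B_i)\,Z\,P(B_i)}\,dP \\ &= \sum_i \mu_t(B_i) \int_{B_i} \frac{Z_t}{\mu_t(B_i)} \ln \frac{Z_t/\mu_t(B_i)}{Z/\mu(B_i)}\,dP = \sum_i \mu_t(B_i)\, I(\mathcal{A}|B_i), \end{aligned} \tag{2.17}$$

where
$$\begin{aligned} I(\mathcal{A}|B_i) &= \int_{B_i} \frac{Z_t}{\mu_t(B_i)} \ln \frac{Z_t/\mu_t(B_i)}{Z/\mu(B_i)}\,dP = \int_{B_i} \ln \frac{d\mu_t/\mu_t(B_i)}{d\mu/\mu(B_i)}\,\frac{d\mu_t}{\mu_t(B_i)} \\ &= \frac{1}{\mu_t(B_i)} \int_{B_i} Z_t \ln \frac{Z_t}{Z}\,dP - \ln \frac{\mu_t(B_i)}{\mu(B_i)} = \frac{1}{\mu_t(B_i)} \int_{B_i} W_t \ln W_t\,d\mu - \ln \frac{\mu_t(B_i)}{\mu(B_i)}. \end{aligned} \tag{2.18}$$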


Note that $I(\mathcal{A}|B_i)$ is a discrimination information, and
$$I(\mathcal{A}|B_i) \geq 0, \tag{2.19}$$
with equality if and only if
$$Z_t/Z = \mu_t(B_i)/\mu(B_i), \qquad \omega \in B_i \tag{2.20}$$
(see [11, Corollary 3.1, p. 15]).
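For $\mathcal{B}$ generated by a finite partition, (2.15), (2.17) and (2.18) reduce everything to finite sums, so Theorem 2.3 can be checked numerically. The Python sketch below computes $I(\mathcal{A}|\mathcal{B})$ both from the general formula (2.10) and from the atom-wise decomposition (2.17); the six-point space, all numbers and the helper names are illustrative assumptions.

```python
import math

# Illustrative six-point space; A = all subsets, B generated by `blocks`.
P = [0.10, 0.20, 0.10, 0.25, 0.15, 0.20]
Zt = [1.60, 0.70, 1.20, 1.00, 0.80, 1.05]   # density of mu_t w.r.t. P
Z = [0.50, 1.50, 1.00, 0.80, 1.20, 0.85]    # density of mu w.r.t. P
blocks = [[0, 1], [2, 3, 4], [5]]


def mass(dens, b):
    """mu_t(B) or mu(B): the integral of the density over the atom b."""
    return sum(dens[i] * P[i] for i in b)


# (1.4) on a finite space
I_A = sum(zt * p * math.log(zt / z) for zt, z, p in zip(Zt, Z, P))

# (2.15): discrimination information in B
I_B = sum(mass(Zt, b) * math.log(mass(Zt, b) / mass(Z, b)) for b in blocks)


def I_given_atom(b):
    """I(A | B_i) from the first form of (2.18)."""
    mt, m = mass(Zt, b), mass(Z, b)
    return sum(Zt[i] * P[i] / mt * math.log((Zt[i] / mt) / (Z[i] / m)) for i in b)


# (2.17): I(A|B) as a mu_t-weighted sum of the atom-wise informations
I_A_given_B = sum(mass(Zt, b) * I_given_atom(b) for b in blocks)

# (2.10) directly, with E^B computed as P-weighted averages over atoms, as in (2.14)
EB_Zt, EB_Z = [0.0] * len(P), [0.0] * len(P)
for b in blocks:
    pb = sum(P[i] for i in b)
    for i in b:
        EB_Zt[i], EB_Z[i] = mass(Zt, b) / pb, mass(Z, b) / pb
I_210 = sum(Zt[i] * P[i] * math.log((Zt[i] / Z[i]) / (EB_Zt[i] / EB_Z[i])) for i in range(len(P)))

print(I_A, I_B + I_A_given_B, I_A_given_B, I_210)
```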

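To obtain another representation for $I(\mathcal{A}|\mathcal{B})$ when $\mathcal{B}$ is a separable sub-sigma-algebra of $\mathcal{A}$, we proceed as follows. It follows from (2.18) that
$$\sum_i \mu_t(B_i)\, I(\mathcal{A}|B_i) = I(\mathcal{A}) - \sum_i \mu_t(B_i) \ln \frac{\mu_t(B_i)}{\mu(B_i)}. \tag{2.21}$$

Now let $\Pi(\mathcal{B})$ denote the class of finite $\mathcal{B}$-measurable partitions of $\Omega$; then (see [8])
$$\inf_{\Pi(\mathcal{B})} \sum_i \mu_t(B_i)\, I(\mathcal{A}|B_i) = \inf_{\Pi(\mathcal{B})} \Big\{ I(\mathcal{A}) - \sum_i \mu_t(B_i) \ln \frac{\mu_t(B_i)}{\mu(B_i)} \Big\} = I(\mathcal{A}) - \sup_{\Pi(\mathcal{B})} \sum_i \mu_t(B_i) \ln \frac{\mu_t(B_i)}{\mu(B_i)} = I(\mathcal{A}) - I(\mathcal{B}) = I(\mathcal{A}|\mathcal{B}). \tag{2.22}$$

From (2.22) we note that $I(\mathcal{B}|\mathcal{B}) = 0$.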

Let $W_t$ be defined as in (0.3) and denote by $\mathcal{B}_{W_t}$ the class of sets $W_t^{-1}(B)$, where $B$ ranges over the linear Borel sets, that is,
$$W_t^{-1}(B) = \{\omega : W_t(\omega) \in B\}. \tag{2.23}$$
$\mathcal{B}_{W_t}$ is the minimal sigma-algebra with respect to which $W_t$ is measurable, and $\mathcal{B}_{W_t}$ is a separable sub-sigma-algebra of $\mathcal{A}$. For convenience we shall hereafter denote $\mathcal{B}_{W_t}$ by $\mathcal{B}_t$. We shall now show that $\mathcal{B}_t$ is a sufficient sub-sigma-algebra for $\mathcal{A}$.

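Theorem 2.5. $\mathcal{B}_t$ is a sufficient sub-sigma-algebra for $\mathcal{A}$, that is, $I(\mathcal{A}) = I(\mathcal{B}_t)$.

Proof.
$$I(\mathcal{A}) = \int W_t \ln W_t\,d\mu = \int E_Z^{\mathcal{B}_t}(W_t \ln W_t)\,d\mu_{\mathcal{B}_t} \geq \int E_Z^{\mathcal{B}_t} W_t \ln E_Z^{\mathcal{B}_t} W_t\,d\mu_{\mathcal{B}_t} = I(\mathcal{B}_t) = \int W_t \ln W_t\,d\mu = I(\mathcal{A}), \tag{2.24}$$
where we have used the fact that, since $W_t$ is $\mathcal{B}_t$-measurable, $E_Z^{\mathcal{B}_t} W_t = W_t$ $[\mu]$. Note that, using (0.5),
$$E^{\mathcal{B}_t} Z_t / E^{\mathcal{B}_t} Z = Z_t/Z,$$
which is the necessary and sufficient condition (2.13) for Theorem 2.4.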
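On a finite space, $\mathcal{B}_t$ is generated by the level sets of $W_t$, and Theorem 2.5 says that lumping together atoms with the same likelihood ratio loses no information, whereas merging atoms with different ratios does. A Python sketch with exact rational arithmetic; the numbers and helper names are illustrative.

```python
import math
from collections import defaultdict
from fractions import Fraction as Fr

# Illustrative six-point space with exact rational probabilities and densities.
P = [Fr(1, 10), Fr(1, 5), Fr(1, 10), Fr(1, 5), Fr(1, 5), Fr(1, 5)]
Zt = [Fr(2), Fr(1, 2), Fr(1), Fr(3, 2), Fr(1, 2), Fr(1)]   # density of mu_t w.r.t. P
Z = [Fr(1), Fr(1), Fr(2), Fr(1, 2), Fr(1), Fr(1)]          # density of mu w.r.t. P

W = [zt / z for zt, z in zip(Zt, Z)]        # likelihood ratio W_t = Z_t / Z, exactly

atoms = defaultdict(list)                   # atoms of B_t: the level sets {W_t = w}
for i, w in enumerate(W):
    atoms[w].append(i)


def mass(dens, b):
    return sum(dens[i] * P[i] for i in b)


def info(blocks):
    """Partition sum of mu_t(B) ln(mu_t(B)/mu(B)), as in (2.15)."""
    return sum(float(mass(Zt, b)) * math.log(float(mass(Zt, b) / mass(Z, b))) for b in blocks)


I_A = info([[i] for i in range(len(P))])
I_Bt = info(list(atoms.values()))
I_coarse = info([[0, 1, 2], [3, 4, 5]])     # merges atoms with different W_t
# (2.13): on each atom of B_t, E^{B_t} Z_t / E^{B_t} Z = mu_t(B)/mu(B) equals W_t
cond_213 = all(mass(Zt, b) / mass(Z, b) == W[b[0]] for b in atoms.values())
print(len(atoms), I_A, I_Bt, I_coarse, cond_213)
```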
3. An Information-theoretic Approach. We shall now consider an approach to the integral representation which is believed to be of interest in that it is intrinsically information-theoretic in nature.

Suppose that there is an increasing sequence of sub-sigma-algebras of $\mathcal{A}$ such that
$$\mathcal{B}_0 \subset \mathcal{B}_1 \subset \mathcal{B}_2 \subset \cdots \subset \mathcal{B}_n \subset \cdots \subset \mathcal{B} \subset \mathcal{A}, \tag{3.0}$$
where $\mathcal{B}$ is the minimal sigma-algebra containing $\bigcup_n \mathcal{B}_n$, usually denoted by $\mathcal{B}_n \uparrow \mathcal{B}$ or $\mathcal{B} = \bigvee_n \mathcal{B}_n$.

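Consider
$$\int E^{\mathcal{B}} Z \ln \frac{E^{\mathcal{B}} Z}{E^{\mathcal{B}_n} Z}\,dP = \int E^{\mathcal{B}} Z \ln \frac{E^{\mathcal{B}} Z}{E^{\mathcal{B}_{n+1}} Z}\,dP + \int E^{\mathcal{B}} Z \ln \frac{E^{\mathcal{B}_{n+1}} Z}{E^{\mathcal{B}_n} Z}\,dP, \tag{3.1}$$
so that, since
$$\int E^{\mathcal{B}} Z \ln \frac{E^{\mathcal{B}_{n+1}} Z}{E^{\mathcal{B}_n} Z}\,dP = \int E^{\mathcal{B}_{n+1}}\Big(E^{\mathcal{B}} Z \ln \frac{E^{\mathcal{B}_{n+1}} Z}{E^{\mathcal{B}_n} Z}\Big)\,dP = \int E^{\mathcal{B}_{n+1}} Z \ln \frac{E^{\mathcal{B}_{n+1}} Z}{E^{\mathcal{B}_n} Z}\,dP \geq 0 \tag{3.2}$$
(recall that $E^{\mathcal{B}_{n+1}}(E^{\mathcal{B}} Z) = E^{\mathcal{B}_{n+1}} Z$ and $E^{\mathcal{B}_{n+1}}(E^{\mathcal{B}_n} Z) = E^{\mathcal{B}_n} Z$), we may write (3.1) as
$$\int E^{\mathcal{B}} Z \ln \frac{E^{\mathcal{B}} Z}{E^{\mathcal{B}_n} Z}\,dP = \int E^{\mathcal{B}_{n+1}} Z \ln \frac{E^{\mathcal{B}_{n+1}} Z}{E^{\mathcal{B}_n} Z}\,dP + \int E^{\mathcal{B}} Z \ln \frac{E^{\mathcal{B}} Z}{E^{\mathcal{B}_{n+1}} Z}\,dP. \tag{3.3}$$

Hence, using the notation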
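$$I(\mathcal{B}:\mathcal{B}_n) = \int E^{\mathcal{B}} Z \ln \frac{E^{\mathcal{B}} Z}{E^{\mathcal{B}_n} Z}\,dP, \tag{3.4}$$
we have the chain of inequalities
$$I(\mathcal{B}:\mathcal{B}_0) \geq I(\mathcal{B}:\mathcal{B}_1) \geq \cdots \geq I(\mathcal{B}:\mathcal{B}_n) \geq I(\mathcal{B}:\mathcal{B}_{n+1}) \geq \cdots \geq 0, \tag{3.5}$$
where it is seen from (3.3) that $I(\mathcal{B}:\mathcal{B}_n) - I(\mathcal{B}:\mathcal{B}_{n+1}) = I(\mathcal{B}_{n+1}:\mathcal{B}_n) \geq 0$. Since the monotonically decreasing sequence (3.5) is bounded below, it converges, and hence
$$I(\mathcal{B}:\mathcal{B}_n) - I(\mathcal{B}:\mathcal{B}_{n+1}) = \int E^{\mathcal{B}_{n+1}} Z \ln \frac{E^{\mathcal{B}_{n+1}} Z}{E^{\mathcal{B}_n} Z}\,dP \to 0, \qquad n \to \infty; \tag{3.6}$$
hence, using the result in [12], (3.6) implies
$$\int |E^{\mathcal{B}_{n+1}} Z - E^{\mathcal{B}_n} Z|\,dP \to 0, \qquad n \to \infty. \tag{3.7}$$
Indeed, by considering $I(\mathcal{B}:\mathcal{B}_n) - I(\mathcal{B}:\mathcal{B}_m)$, $m > n$, we can also get
$$\int |E^{\mathcal{B}_m} Z - E^{\mathcal{B}_n} Z|\,dP \to 0, \qquad m, n \to \infty; \tag{3.8}$$
hence the sequence $E^{\mathcal{B}_n} Z$ is $L_1$-fundamental and there exists an $X \in L_1$ such that
$$\int |E^{\mathcal{B}_n} Z - X|\,dP \to 0, \qquad n \to \infty, \tag{3.9}$$
$$\int E^{\mathcal{B}_n} Z\,dP \to \int X\,dP, \qquad n \to \infty, \tag{3.10}$$
that is,
$$\int X\,dP = 1, \tag{3.11}$$
since $\int E^{\mathcal{B}_n} Z\,dP = 1$ for all $n$ (see [13, pp. 157, 161]).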
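The monotone convergence in (3.5) and (3.19), together with the ordering (3.21), can be watched in a concrete case. Assume, purely for illustration, $\Omega = [0, 1)$ with $P$ Lebesgue measure, $Z(\omega) = 2\omega$, $\mathcal{B}$ the Borel sets and $\mathcal{B}_n$ generated by the dyadic intervals of length $2^{-n}$; then $E^{\mathcal{B}_n} Z$ equals $c = a + b$ on the cell $[a, b)$, and each cell's contribution to (3.4) follows from the antiderivative $x^2 \ln(2x/c) - x^2/2$ of $2x \ln(2x/c)$. The Python sketch below (hypothetical helper names) also checks the identity $I(\mathcal{B}:\mathcal{B}_n) = \int E^{\mathcal{B}} Z \ln E^{\mathcal{B}} Z\,dP - \int E^{\mathcal{B}_n} Z \ln E^{\mathcal{B}_n} Z\,dP$ used in the proof of (3.19), where here $\int Z \ln Z\,dP = \ln 2 - 1/2$.

```python
import math


def antideriv(x, c):
    """x^2 ln(2x/c) - x^2/2, an antiderivative of 2x ln(2x/c); its limit at x = 0 is 0."""
    return 0.0 if x == 0.0 else x * x * math.log(2.0 * x / c) - x * x / 2.0


def cells(n):
    """Dyadic cells [a, b) of level n with c = E^{B_n} Z = a + b on the cell (Z(w) = 2w)."""
    h = 2.0 ** -n
    return [(k * h, (k + 1) * h, (2 * k + 1) * h) for k in range(2 ** n)]


def info_gap(n):
    """I(B : B_n) of (3.4): sum over cells of the integral of 2x ln(2x / c) dx."""
    return sum(antideriv(b, c) - antideriv(a, c) for a, b, c in cells(n))


def int_cond_log(n):
    """Integral of E^{B_n} Z ln E^{B_n} Z dP, the quantity ordered by (3.21)."""
    return sum((b - a) * c * math.log(c) for a, b, c in cells(n))


z_log_z = math.log(2.0) - 0.5          # integral of Z ln Z dP for Z(w) = 2w on [0, 1)
gaps = [info_gap(n) for n in range(11)]
ents = [int_cond_log(n) for n in range(11)]
for n in range(0, 11, 2):
    print(n, gaps[n], z_log_z - ents[n])
```

The two printed columns agree, the first decreases toward zero as in (3.5) and (3.19), and `ents` increases toward $\ln 2 - 1/2$ as (3.21) requires.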
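Without restricting the generality we may take $X$ to be $\mathcal{B}$-measurable, by the following argument. Using the result in [13, p. 348] that
$$X_n \xrightarrow{L_1} X \implies E^{\mathcal{B}} X_n \xrightarrow{L_1} E^{\mathcal{B}} X,$$
we have
$$\int |E^{\mathcal{B}_n} Z - X|\,dP \to 0 \implies \int |E^{\mathcal{B}} E^{\mathcal{B}_n} Z - E^{\mathcal{B}} X|\,dP \to 0 \implies \int |E^{\mathcal{B}_n} Z - E^{\mathcal{B}} X|\,dP \to 0, \qquad n \to \infty,$$
since $E^{\mathcal{B}} E^{\mathcal{B}_n} Z = E^{\mathcal{B}_n} Z$. Thus $X$ may be replaced by the $\mathcal{B}$-measurable function $E^{\mathcal{B}} X$; and if $X$ is $\mathcal{B}$-measurable, $E^{\mathcal{B}} X = X$ a.s.

We shall now show that $X = E^{\mathcal{B}} Z$ a.s. Applying Lemma 0.1 to (3.9) we have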
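$$\int_A E^{\mathcal{B}_n} Z\,dP \to \int_A X\,dP \quad \text{uniformly in } A \in \mathcal{A}. \tag{3.12}$$
If $A \in \mathcal{B}_k$, then for $n \geq k$
$$\int_A E^{\mathcal{B}_n} Z\,dP = \int_A E^{\mathcal{B}} Z\,dP, \tag{3.13}$$
so that
$$\int_A X\,dP = \int_A E^{\mathcal{B}} Z\,dP. \tag{3.14}$$
Since (3.14) is true for $A \in \mathcal{B}_k$, it is true for $A \in \bigcup_k \mathcal{B}_k$. Thus the probability measures defined by the integrals in (3.14) are identical on the field $\bigcup_k \mathcal{B}_k$, and hence by the extension theorem [13, p. 87] (3.14) holds for $A \in \mathcal{B}$. The Radon-Nikodym theorem then yields from (3.14) that
$$X = E^{\mathcal{B}} Z \quad [P_{\mathcal{B}}]. \tag{3.15}$$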

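Since
$$\int |E^{\mathcal{B}_n} Z - E^{\mathcal{B}} Z|\,dP \leq \int |E^{\mathcal{B}_n} Z - X|\,dP + \int |X - E^{\mathcal{B}} Z|\,dP, \tag{3.16}$$
we see from (3.9) and (3.15) that (cf. [4, pp. 319, 331])
$$\int |E^{\mathcal{B}_n} Z - E^{\mathcal{B}} Z|\,dP \to 0, \qquad n \to \infty. \tag{3.17}$$
Since, using Theorem 2.2,
$$\int E^{\mathcal{B}} Z \ln \frac{E^{\mathcal{B}} Z}{X}\,dP = 0 \tag{3.18}$$
if and only if $X = E^{\mathcal{B}} Z$ a.s., then in view of (3.15) we conjecture that the sequence (3.5) has the limit zero, that is,
$$\lim_{n \to \infty} I(\mathcal{B}:\mathcal{B}_n) = 0. \tag{3.19}$$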
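We show that (3.19) is true as follows. We write
$$I(\mathcal{B}:\mathcal{B}_n) = \int E^{\mathcal{B}} Z \ln \frac{E^{\mathcal{B}} Z}{E^{\mathcal{B}_n} Z}\,dP = \int E^{\mathcal{B}} Z \ln E^{\mathcal{B}} Z\,dP - \int E^{\mathcal{B}} Z \ln E^{\mathcal{B}_n} Z\,dP = \int E^{\mathcal{B}} Z \ln E^{\mathcal{B}} Z\,dP - \int E^{\mathcal{B}_n} Z \ln E^{\mathcal{B}_n} Z\,dP.$$
We have shown that $E^{\mathcal{B}_n} Z \xrightarrow{L_1} E^{\mathcal{B}} Z$, which implies that $E^{\mathcal{B}_n} Z \xrightarrow{P} E^{\mathcal{B}} Z$. The convergence in probability implies that there exists a sequence $\{n_k\}$ of integers increasing to infinity such that [13, p. 151]
$$E^{\mathcal{B}_{n_k}} Z \xrightarrow{\text{a.s.}} E^{\mathcal{B}} Z.$$
Since the convex function $x \ln x$ is bounded below by $-1/e$, so that $E^{\mathcal{B}_n} Z \ln E^{\mathcal{B}_n} Z \geq -1/e$, the Fatou-Lebesgue theorem [13, p. 152] yields
$$\liminf_{k \to \infty} \int E^{\mathcal{B}_{n_k}} Z \ln E^{\mathcal{B}_{n_k}} Z\,dP \geq \int E^{\mathcal{B}} Z \ln E^{\mathcal{B}} Z\,dP. \tag{3.20}$$
But the convexity, Jensen's inequality, and the smoothing property of conditioning [13, pp. 348, 351] lead to
$$\int E^{\mathcal{B}} Z \ln E^{\mathcal{B}} Z\,dP \geq \int E^{\mathcal{B}_n} Z \ln E^{\mathcal{B}_n} Z\,dP \geq \int E^{\mathcal{B}_m} Z \ln E^{\mathcal{B}_m} Z\,dP \tag{3.21}$$
for all $n, m$ such that $n > m$.

From the monotonic property in (3.21) combined with (3.20) we conclude that
$$\lim_{k \to \infty} \int E^{\mathcal{B}_{n_k}} Z \ln E^{\mathcal{B}_{n_k}} Z\,dP = \int E^{\mathcal{B}} Z \ln E^{\mathcal{B}} Z\,dP, \qquad \text{that is,} \qquad \lim_{k \to \infty} I(\mathcal{B}:\mathcal{B}_{n_k}) = 0.$$
Since $I(\mathcal{B}:\mathcal{B}_n)$ converges, it must converge to the same limit, that is, we have (3.19).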


Note that if we interpret $I(\mathcal{B}:\mathcal{B}_n)$ as a measure of the closeness of the approximation to the measures over $\mathcal{B}$ by the measures over $\mathcal{B}_n$, the sequence (3.5) implies that the approximation gets better as $n$ gets larger, and in the limit the approximation is exact to within sets of measure zero.

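A similar argument of course follows for $Z_t$. If we start with
$$\int E_Z^{\mathcal{B}} W_t \ln \frac{E_Z^{\mathcal{B}} W_t}{E_Z^{\mathcal{B}_n} W_t}\,d\mu = \int E_Z^{\mathcal{B}} W_t \ln \frac{E_Z^{\mathcal{B}} W_t}{E_Z^{\mathcal{B}_{n+1}} W_t}\,d\mu + \int E_Z^{\mathcal{B}} W_t \ln \frac{E_Z^{\mathcal{B}_{n+1}} W_t}{E_Z^{\mathcal{B}_n} W_t}\,d\mu, \tag{3.22}$$
then, since
$$\int E_Z^{\mathcal{B}} W_t \ln \frac{E_Z^{\mathcal{B}_{n+1}} W_t}{E_Z^{\mathcal{B}_n} W_t}\,d\mu = \int E_Z^{\mathcal{B}_{n+1}} W_t \ln \frac{E_Z^{\mathcal{B}_{n+1}} W_t}{E_Z^{\mathcal{B}_n} W_t}\,d\mu, \tag{3.24}$$
we can repeat the preceding argument and conclude that
$$\lim_{n \to \infty} \int E_Z^{\mathcal{B}} W_t \ln \frac{E_Z^{\mathcal{B}} W_t}{E_Z^{\mathcal{B}_n} W_t}\,d\mu = 0. \tag{3.25}$$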


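We shall now use these results to show that
$$I(\mathcal{B}) = \int E_Z^{\mathcal{B}} W_t \ln E_Z^{\mathcal{B}} W_t\,d\mu = \lim_{n \to \infty} \int E_Z^{\mathcal{B}_n} W_t \ln E_Z^{\mathcal{B}_n} W_t\,d\mu = \lim_{n \to \infty} I(\mathcal{B}_n). \tag{3.26}$$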

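Using the result in Theorem 2.3 we may write
$$\begin{aligned} I(\mathcal{B}|\mathcal{B}_0) &= I(\mathcal{B}_1|\mathcal{B}_0) + I(\mathcal{B}|\mathcal{B}_1), \\ I(\mathcal{B}|\mathcal{B}_1) &= I(\mathcal{B}_2|\mathcal{B}_1) + I(\mathcal{B}|\mathcal{B}_2), \\ &\;\;\vdots \\ I(\mathcal{B}|\mathcal{B}_{n-1}) &= I(\mathcal{B}_n|\mathcal{B}_{n-1}) + I(\mathcal{B}|\mathcal{B}_n), \end{aligned} \tag{3.27}$$
or, using the relations in (3.27),
$$I(\mathcal{B}|\mathcal{B}_0) = I(\mathcal{B}) = \sum_{k=0}^{n-1} I(\mathcal{B}_{k+1}|\mathcal{B}_k) + I(\mathcal{B}|\mathcal{B}_n), \tag{3.28}$$
where
$$I(\mathcal{B}|\mathcal{B}_n) = \int E_Z^{\mathcal{B}} W_t \ln \frac{E_Z^{\mathcal{B}} W_t}{E_Z^{\mathcal{B}_n} W_t}\,d\mu. \tag{3.29}$$
Since the relation (3.25) applied to (3.29) implies
$$I(\mathcal{B}|\mathcal{B}_n) \to 0 \quad \text{as } n \to \infty, \tag{3.30}$$
we get from (3.28) (see (2.8), (2.10))
$$I(\mathcal{B}) = \sum_{k=0}^{\infty} I(\mathcal{B}_{k+1}|\mathcal{B}_k) = \lim_{n \to \infty} \sum_{k=0}^{n-1} \big(I(\mathcal{B}_{k+1}) - I(\mathcal{B}_k)\big) = \lim_{n \to \infty} I(\mathcal{B}_n). \tag{3.31}$$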
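The telescoping identity (3.31) can be checked on a finite space where the chain (3.0) terminates. The Python sketch below, with illustrative densities on eight points and nested dyadic partitions generating $\mathcal{B}_0 \subset \mathcal{B}_1 \subset \mathcal{B}_2 \subset \mathcal{B}_3$, computes each $I(\mathcal{B}_{k+1}|\mathcal{B}_k)$ by applying the atom-wise pattern of (2.17)-(2.18) to the conditional measures on each atom of $\mathcal{B}_k$, and compares the partial sums with $I(\mathcal{B}_n)$. The helper name `cond_info` is hypothetical.

```python
import math

# Eight-point space with P uniform; illustrative densities of mu_t and mu w.r.t. P.
P = [1 / 8] * 8
Zt = [1.8, 0.4, 1.2, 0.6, 1.5, 0.9, 0.3, 1.3]
Z = [0.6, 1.4, 1.0, 1.4, 0.5, 1.5, 0.8, 0.8]

# levels[n] is the partition generating B_n: blocks of 8 / 2**n consecutive points.
levels = [[list(range(j, j + 8 // 2 ** n)) for j in range(0, 8, 8 // 2 ** n)] for n in range(4)]


def mass(dens, b):
    return sum(dens[i] * P[i] for i in b)


def info(blocks):
    """I(B) for B generated by a finite partition, as in (2.15)."""
    return sum(mass(Zt, b) * math.log(mass(Zt, b) / mass(Z, b)) for b in blocks)


def cond_info(fine, coarse):
    """I(fine | coarse) for nested finite partitions: for each coarse atom C, the information
    between the conditional measures mu_t(.|C) and mu(.|C) on the fine atoms inside C,
    weighted by mu_t(C), following the pattern of (2.17)-(2.18)."""
    total = 0.0
    for C in coarse:
        inner = 0.0
        for D in fine:
            if set(D) <= set(C):
                pt, pm = mass(Zt, D) / mass(Zt, C), mass(Z, D) / mass(Z, C)
                inner += pt * math.log(pt / pm)
        total += mass(Zt, C) * inner
    return total


I_levels = [info(lv) for lv in levels]                             # I(B_0), ..., I(B_3)
steps = [cond_info(levels[k + 1], levels[k]) for k in range(3)]    # I(B_{k+1} | B_k)
partial = [sum(steps[:n]) for n in range(4)]                       # truncated sums in (3.31)
print(I_levels)
print(partial)
```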

From (3.31) it is seen that if $I(\mathcal{B}) < \infty$, then $I(\mathcal{B}_{n+1}|\mathcal{B}_n) \to 0$. If $I(\mathcal{B}_{n+1}|\mathcal{B}_n) \to I < \infty$ as $n \to \infty$, then by the Toeplitz lemma [13, p. 238]
$$\lim_{n \to \infty} \frac{1}{n}\, I(\mathcal{B}_n) = I.$$

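We are now in a position to apply (3.31) to prove the integral representation theorem.

In the preliminaries it was seen that a separable sigma-algebra can be generated by a regular sequence of partitions, and using the notation of (0.18) we have
$$\mathcal{F}^n \uparrow \mathcal{E}. \tag{3.32}$$
Accordingly, as a special case of (3.31) we have that
$$\lim_{n \to \infty} I(\mathcal{F}^n) = I(\mathcal{E}). \tag{3.33}$$
Recalling (1.11), (2.15) and Theorem 2.5, we note that (3.33) is the integral representation theorem. Indeed, take $\mathcal{E} = \mathcal{B}_t$, which is separable and, by Theorem 2.5, sufficient for $\mathcal{A}$; by (2.15) each $I(\mathcal{F}^n)$ is a sum of the type in (1.3), so that $I(\mathcal{A}) = I(\mathcal{B}_t) = \lim_n I(\mathcal{F}^n) \leq \bar{I}(\mathcal{A})$, and together with (1.9) this gives
$$I(\mathcal{A}) = \sup \sum_i \mu_t(A_i) \ln \frac{\mu_t(A_i)}{\mu(A_i)} = \bar{I}(\mathcal{A}), \tag{3.34}$$
where the sup is taken over all possible $\mathcal{A}$-measurable finite partitions of $\Omega$; in particular, if the $\mathcal{B}_n$ are generated by finite partitions, then (3.31) is the same as (2.4).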

For application of the preceding results, particularly (3.31), to stochastic processes we state as Lemma 3.1 a result which is Theorem 1.6, page 604 of [4].

Lemma 3.1. Let $\mathcal{A}$ be a sigma-algebra of $\omega$ sets, and let $\{x(t, \omega), t \in T\}$ be a family of $\omega$ functions measurable with respect to $\mathcal{A}$. Let $\mathcal{B}_S$ be the sigma-algebra generated by $\{x(t, \omega), t \in S \subset T\}$. Suppose that $T$ is non-denumerable. Then if $A \in \mathcal{B}_T$, there is a denumerable subset $S$ (depending on $A$) of $T$ such that $A \in \mathcal{B}_S$. If $X(\omega)$ is an $\omega$ function measurable with respect to $\mathcal{B}_T$, there is a denumerable subset $S$ (depending on $X$) such that $X$ is measurable with respect to $\mathcal{B}_S$.

Now let $\{x(t, \omega), t \in T\}$ be an arbitrary system of random variables defining a stochastic process. Let $\mathcal{B}_N$ be the sigma-algebra generated by the sub-system $\{x(t, \omega), t \in N \subset T\}$, and $\mathcal{B}_T$ the sigma-algebra generated by the system of random variables defining the stochastic process. We can now state

Theorem 3.1.
$$I(\mathcal{B}_T) = \sup_{N \in \mathcal{N}} I(\mathcal{B}_N), \tag{3.35}$$
where $\mathcal{N}$ is the class of all finite subsets of $T$.

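Proof. If $T$ is countable, then it is possible to choose finite subsets $N_1 \subset N_2 \subset \cdots$ such that $\mathcal{B}_T$ is the smallest sigma-algebra containing $\bigcup_n \mathcal{B}_{N_n}$, and (3.35) is then essentially (3.31). If $T$ is not countable we shall use Lemma 3.1. If $I(\mathcal{B}_S) = \infty$ for some countable $S \subset T$, then $I(\mathcal{B}_T) = \infty$. Suppose $I(\mathcal{B}_T) < \infty$; then $E_Z^{\mathcal{B}_T} W_t = E^{\mathcal{B}_T} Z_t / E^{\mathcal{B}_T} Z$ (see (0.5)) exists and is of course measurable with respect to $\mathcal{B}_T$. But according to Lemma 3.1 every function measurable with respect to $\mathcal{B}_T$ is measurable with respect to $\mathcal{B}_S$ for at least one countable $S \subset T$, so that
$$E_Z^{\mathcal{B}_T} W_t = E_Z^{\mathcal{B}_S} W_t \quad [\mu], \qquad \text{or} \qquad E^{\mathcal{B}_T} Z_t / E^{\mathcal{B}_T} Z = E^{\mathcal{B}_S} Z_t / E^{\mathcal{B}_S} Z \quad \text{a.s.} \tag{3.36}$$
Since (3.36) is the necessary and sufficient condition that $I(\mathcal{B}_T) = I(\mathcal{B}_S)$, we get
$$I(\mathcal{B}_T) = I(\mathcal{B}_S) = \sup_{N \in \mathcal{N}} I(\mathcal{B}_N). \tag{3.37}$$

For a similar result see [9].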

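4. Monotonicity. If the sigma-algebra $\mathcal{B}$ in (3.0) is not the sigma-algebra $\mathcal{A}$ of the probability space $(\Omega, \mathcal{A}, P)$, then, again using Theorem 2.3 as in (3.27), we have
$$I(\mathcal{A}) = I(\mathcal{B}) + I(\mathcal{A}|\mathcal{B}), \tag{4.1}$$
$$I(\mathcal{A}) = I(\mathcal{B}_n) + I(\mathcal{A}|\mathcal{B}_n). \tag{4.2}$$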

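We can now derive certain limiting relations in which $\mathcal{A}$ plays a role. From (4.2) and (3.31) we see that
$$I(\mathcal{A}) = \lim_{n \to \infty} I(\mathcal{B}_n) + \lim_{n \to \infty} I(\mathcal{A}|\mathcal{B}_n) = I(\mathcal{B}) + \lim_{n \to \infty} I(\mathcal{A}|\mathcal{B}_n), \tag{4.3}$$
hence for $I(\mathcal{A}) < \infty$
$$I(\mathcal{A}) - I(\mathcal{B}) = I(\mathcal{A}|\mathcal{B}) = \lim_{n \to \infty} I(\mathcal{A}|\mathcal{B}_n). \tag{4.4}$$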

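Similarly, from
$$I(\mathcal{B}) = I(\mathcal{B}_n) + I(\mathcal{B}|\mathcal{B}_n) \tag{4.5}$$
we obtain
$$I(\mathcal{B}) = \lim_{n \to \infty} I(\mathcal{B}_n) + \lim_{n \to \infty} I(\mathcal{B}|\mathcal{B}_n), \tag{4.6}$$
so that if $I(\mathcal{B}) < \infty$
$$I(\mathcal{B}) - \lim_{n \to \infty} I(\mathcal{B}_n) = \lim_{n \to \infty} I(\mathcal{B}|\mathcal{B}_n) = 0. \tag{4.7}$$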

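Note that if $I(\mathcal{A}) < \infty$ and $\mathcal{B}_m \subset \mathcal{B}_n$, then using (4.2) we have
$$I(\mathcal{A}|\mathcal{B}_m) - I(\mathcal{A}|\mathcal{B}_n) = I(\mathcal{B}_n) - I(\mathcal{B}_m) \geq 0. \tag{4.8}$$
Similarly, for $\mathcal{B}_k \subset \mathcal{B}_m \subset \mathcal{B}_n$ we have
$$I(\mathcal{B}_n|\mathcal{B}_k) \geq I(\mathcal{B}_m|\mathcal{B}_k). \tag{4.9}$$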

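As a matter of fact, for $\mathcal{B}_1 \subset \mathcal{B}_2 \subset \mathcal{A}$ we can write, if $I(\mathcal{A}) < \infty$ (for a related discussion see [8]),
$$I(\mathcal{A}|\mathcal{B}_1) = I(\mathcal{B}_2|\mathcal{B}_1) + I(\mathcal{A}|\mathcal{B}_2), \tag{4.10}$$
which can be proven either directly or from the fact that
$$[I(\mathcal{A}) - I(\mathcal{B}_1)] = [I(\mathcal{B}_2) - I(\mathcal{B}_1)] + [I(\mathcal{A}) - I(\mathcal{B}_2)]. \tag{4.11}$$
Since the information values in (4.10) are non-negative, it follows that
$$I(\mathcal{A}|\mathcal{B}_1) \geq I(\mathcal{A}|\mathcal{B}_2), \tag{4.12}$$
with equality if and only if $I(\mathcal{B}_2|\mathcal{B}_1) = 0$, and
$$I(\mathcal{A}|\mathcal{B}_1) \geq I(\mathcal{B}_2|\mathcal{B}_1), \tag{4.13}$$
with equality if and only if $I(\mathcal{A}|\mathcal{B}_2) = 0$.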
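A numerical check of (4.10), (4.12) and (4.13) on a six-point space, with $\mathcal{A}$ the power set and $\mathcal{B}_1 \subset \mathcal{B}_2$ generated by nested partitions. The conditional informations are computed with the same atom-wise weighting as in (2.17)-(2.18); all numbers and the helper name `cond_info` are illustrative assumptions.

```python
import math

# Six-point space; A is the power set, and B1 (coarser) is contained in B2 (illustrative).
P = [0.10, 0.15, 0.20, 0.05, 0.30, 0.20]
Zt = [2.0, 1.0, 0.5, 3.0, 0.8, 0.8]          # density of mu_t w.r.t. P
Z = [1.0] * 6                                # mu = P here, for simplicity
A = [[i] for i in range(6)]
B2 = [[0, 1], [2, 3], [4, 5]]
B1 = [[0, 1, 2, 3], [4, 5]]


def mass(dens, b):
    return sum(dens[i] * P[i] for i in b)


def cond_info(fine, coarse):
    """I(fine | coarse) for nested finite partitions, weighted atom-wise as in (2.17)."""
    total = 0.0
    for C in coarse:
        for D in fine:
            if set(D) <= set(C):
                pt, pm = mass(Zt, D) / mass(Zt, C), mass(Z, D) / mass(Z, C)
                total += mass(Zt, C) * pt * math.log(pt / pm)
    return total


I_A_B1, I_B2_B1, I_A_B2 = cond_info(A, B1), cond_info(B2, B1), cond_info(A, B2)
print(I_A_B1, I_B2_B1 + I_A_B2)                 # the two sides of (4.10)
print(I_A_B1 >= I_A_B2, I_A_B1 >= I_B2_B1)      # (4.12) and (4.13)
```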

Using $\mathcal{A}$ in place of $\mathcal{B}$ in (3.27) we have, corresponding to (3.28),
$$I(\mathcal{A}|\mathcal{B}_0) = I(\mathcal{A}) = \sum_{k=0}^{n-1} I(\mathcal{B}_{k+1}|\mathcal{B}_k) + I(\mathcal{A}|\mathcal{B}_n), \tag{4.14}$$
so that, using (4.4),
$$I(\mathcal{A}) = \sum_{k=0}^{\infty} I(\mathcal{B}_{k+1}|\mathcal{B}_k) + I(\mathcal{A}|\mathcal{B}). \tag{4.15}$$

In connection with the results of this section see the axiomatic approach in [15].